Abstract. We introduce an automata model for data words, that is, words that
carry at each position a symbol from a finite alphabet and a value from an
unbounded data domain. The model is (semantically) a restriction of data automata,
introduced by Bojanczyk et al. in 2006, and is therefore called weak data automata.
It is strictly less expressive than data automata, and its expressive power is
incomparable with that of register automata. The expressive power of weak data automata
corresponds exactly to existential monadic second-order logic with successor +1 and
data value equality ∼, EMSO²(+1, ∼). It follows from previous work by David et al.
in 2010 that the nonemptiness problem for weak data automata can be decided in
2-NEXPTIME. Furthermore, we study weak Büchi automata on data ω-strings. They
can be characterized by the extension of EMSO²(+1, ∼) with existential quantifiers
for infinite sets. Finally, the same complexity bound for their nonemptiness problem
is established by a nondeterministic polynomial-time reduction to the nonemptiness
problem of weak data automata.



1 Introduction 

Motivated by challenges in XML reasoning and infinite-state Model Checking, an extension 
of strings and finitely labelled trees by data values has been investigated in recent years. 
In classical automata theory, a string is a sequence of positions that carry a symbol from 
some finite alphabet. In a nutshell, data strings generalize strings, in that every position 
additionally carries a data value from some infinite domain. In the same way, data trees 
generalize (finitely) labelled trees. In XML Theory, data trees model XML documents. 
Here, the data values can be used to represent attribute values or text content. Neither
can be adequately modelled by a finite alphabet. In a Model Checking scenario, the
data values can be used, e.g., to represent process IDs or other data.

Early investigations in this area usually considered strings over an "infinite alphabet",
that is, each position has only a value, but no finite-alphabet symbol |2ll9l7ll4ll5ll7j.
Many of the automata models and logics that have been studied for data strings and trees
lack the usual nice decidability properties of automata over finite alphabets, unless strong
restrictions are imposed [10,4,3,1].

A result that is particularly interesting for our investigations is the decidability of the
satisfiability problem for two-variable logic over data strings [1]. Here, as usual, the logical
quantifiers range over the positions of the data string, and it can be checked whether a
position x carries a symbol a (written: a(x)), whether it is the left neighbor of a position y
(x + 1 = y), whether x is somewhere to the left of y (x < y), and whether x and y carry
the same data value (x ∼ y). The logic is denoted by FO²(+1, <, ∼). The result was shown
with the help of a newly introduced automata model for data words, data automata (DA).
It turned out that the expressive power of these automata can actually be characterized
by the extension of FO²(+1, <, ∼) with existential quantification over sets (of positions)
and an additional predicate that holds for x and y if y is the next position from x with the
same data value.

However, the complexity of the decision procedure for FO²(+1, <, ∼) is very high. The
problem is equivalent to the reachability problem for Petri nets [12], a notoriously hard
problem whose complexity has not been resolved exactly. Thus, it has been investigated how
the complexity can be reduced by dropping one of the predicates x < y or x + 1 = y. In the
latter case (that is, for FO²(<, ∼)) the complexity decreases to NEXPTIME [4]. In the former
case (FO²(+1, ∼)) the complexity also becomes elementary. In a 3-NEXPTIME bound
was shown for the case of data trees and this bound clearly carries over to data strings. A
more direct proof with a 4-NEXPTIME bound was given in and a 2-NEXPTIME bound
was obtained in [18].

The high complexity of the satisfiability of FO²(+1, <, ∼) in [4] results from the high
complexity of the nonemptiness problem for data automata. One of the starting questions
for this paper was:

(1) Is there a natural restriction of data automata with (i) a better complexity and (ii) a
correspondence to FO²(+1, ∼)?

We show that such a restriction indeed exists. Data automata consist of two automata A
and B. A is a non-deterministic letter-to-letter transducer that constructs, given the finite
alphabet part of the input data string u, a new data string w (where, for each position, the
data value in w is the same as in u). The second automaton B can then check properties
of the subsequences of w that carry the same data value. We define weak data automata
(WDA), which also use a non-deterministic letter-to-letter transducer but can only test some
simple constraints on the subsequences in the second part. These constraints are (unary)
key, inclusion and denial constraints, and they are evaluated for each class separately (there
are no inter-class constraints).

It turns out that WDA are expressively weaker than data automata, incomparable
with register automata [14,1], and that their expressiveness can be precisely characterized
by the extension of FO²(+1, ∼) by existential set quantification, that is, EMSO²(+1, ∼).
As the property that we use to separate the expressive power of WDA and DA can be
defined in EMSO²(+1, <, ∼), we get that EMSO²(+1, ∼) ⊊ EMSO²(+1, <, ∼), as opposed
to the classical setting (without data values) where EMSO²(+1) = EMSO²(+1, <). Indeed,
one of the benefits of the logical characterization is that it gives an easy means to show
non-expressibility for EMSO²(+1, ∼) (and FO²(+1, ∼)). From results in [5] it immediately
follows that the nonemptiness problem for WDA can be solved in 2-NEXPTIME.

As mentioned above, one motivation to study data strings comes from Model Checking.
In that context, systems are usually considered to run forever and to produce infinite traces.
Thus, data ω-words need to be considered as well, and this was actually one of the main
motivations of this research. In particular, we address the following questions.

(2) Do the complexity results of [8] carry over to data ω-strings?

(3) Can the expressibility results and logical characterizations of the first part of the paper
also be established for data ω-strings?

It is straightforward to adapt weak data automata for data ω-strings. The transducer can
simply be equipped with a Büchi acceptance mechanism. We refer to the resulting model
as weak Büchi data automata (WBDA). It turns out that the answer to both questions,
(2) and (3), is affirmative. For (3), this is not hard to prove. The separation of WDA
from DA also separates WBDA from Büchi data automata. It is also not too hard to
get a logical characterization of WBDA by extending EMSO²(+1, ∼) with existential set
quantifiers that are semantically restricted to bind to infinite sets. The answer to question
(2) required considerably more effort. However, we establish a 2-NEXPTIME upper bound
for the nonemptiness problem for WBDA by a nondeterministic polynomial-time reduction
to the nonemptiness problem for WDA.

Related work. Some related work was already mentioned above. The pioneering works on
Linear Temporal Logic for ω-words with data are the papers [10,9]. In an extension of
Linear Temporal Logic (LTL) to handle data values is proposed and its satisfiability problem
is shown to be decidable. The decision procedure is a reduction to the reachability problem
in Petri nets, thus resulting in a similarly unknown complexity as for data automata. The
logic and automata considered in [TU] are decidable for finite data words, but not primitive
recursive, and undecidable for ω-words. In [16] it is shown that with a safety restriction
both the logic and the automata become decidable, even in EXPSPACE. In [9] a logic with
PSPACE complexity is considered. In [5], MSO logic on data words (with possibly multiple
data values per position) is compared to automata models for various types of successor
relations.

Organization. We give basic definitions in Section 2. In Section 3 weak data automata
are defined, their complexity is given, and their expressive power is compared with other
models. Section 4 gives the logical characterization of WDA by EMSO²(+1, ∼). Section 5
studies data ω-strings and shows how the nonemptiness problem of WBDA can be
nondeterministically reduced in polynomial time to the nonemptiness of WDA. Section 6 states
some open problems.

Acknowledgement. We thank Christof Loding for helpful remarks on automata and logics
for ω-words and Thomas Zeume for thorough proofreading.

2 Notation 

Data words. Let Σ be a finite alphabet and D an infinite set of data values. A finite word
is an element of Σ*, while an ω-word is an element of Σ^ω. A finite data word is an element
of (Σ × D)*, while a data ω-word is an element of (Σ × D)^ω. We often refer to data words
also as data strings.

We write a data (finite or ω-) word w as (a_1, d_1)(a_2, d_2)···, where a_1, a_2, ... ∈ Σ and
d_1, d_2, ... ∈ D. The symbol a_i is the label of position i, while the value d_i is the data
value of position i. The projection of w to the alphabet Σ is denoted by Str(w) = a_1 a_2 ....
A position in w is called an a-position if the label of that position is a. We denote by
V_w(a) the set of data values found in a-positions in w, i.e., V_w(a) = {d_i | a_i = a}, for each
a ∈ Σ. Note that some of the sets V_w(a) may be infinite, while others are finite.

A maximal set of positions with the same data value d is called a class c^d of the word, and
the Σ-string induced by the symbols at its positions is called the class string w^d. The profile
word of a data ω-word w = (a_1, d_1)(a_2, d_2)··· is Profile(w) = (a_1, s_1)(a_2, s_2)··· ∈ (Σ × {⊤, ⊥})^ω,
where for each position i ≥ 1 the component s_i is ⊤ if and only if d_i = d_{i+1}. The profile
word of a finite data word (a_1, d_1)(a_2, d_2)···(a_n, d_n) is defined similarly, with the addition that the
component s_n is ⊥.
Automata and Büchi automata. An automaton A over the alphabet Σ is a tuple A =
(Σ, Q, q_0, Δ, F), where Q is a finite set of states, q_0 ∈ Q is the initial state, Δ ⊆ Q × Σ × Q
is a set of transitions and F ⊆ Q is a set of accepting states. A run of A on a word
w = a_1 a_2 ... a_n is a sequence ρ = q_1 ... q_n of states from Q − {q_0} such that (q_0, a_1, q_1) ∈ Δ
and (q_i, a_{i+1}, q_{i+1}) ∈ Δ for each 1 ≤ i < n. The run ρ is accepting if q_n ∈ F.

A Büchi automaton A is syntactically just an automaton. A run of A on an ω-word w =
a_1 a_2 ... is an infinite sequence ρ = q_1 q_2 ... of states from Q − {q_0} such that (q_0, a_1, q_1) ∈ Δ
and (q_i, a_{i+1}, q_{i+1}) ∈ Δ for each i ≥ 1. Let Inf(ρ) denote the set of states that appear
infinitely many times in ρ. The run ρ is accepting if Inf(ρ) ∩ F ≠ ∅.

A word (resp. an ω-word) w is accepted by an automaton (resp. Büchi automaton) A
if there exists an accepting run of A on w. As usual, L(A) (resp. L^ω(A)) denotes the set
of words (resp. ω-words) accepted by the automaton A.

Letter-to-letter transducers. A letter-to-letter transducer over the input alphabet Σ and the
output alphabet Γ is a tuple T = (Σ, Γ, Q, q_0, Δ, F), where Q, q_0, F are the set of states,
the initial state, and the set of accepting states, respectively, and Δ ⊆ Q × Σ × Q × Γ
is the set of transitions. The intuitive meaning of a transition (q, a, q', γ) is that when the
automaton is in state q, reading the symbol a, then it can move to the state q' and output γ.
A run of T on a word w = a_1 a_2 ... a_n is a sequence (q_1, γ_1), ..., (q_n, γ_n) over (Q − {q_0}) × Γ
such that (q_0, a_1, q_1, γ_1) ∈ Δ and (q_i, a_{i+1}, q_{i+1}, γ_{i+1}) ∈ Δ, for each 1 ≤ i < n. Likewise, a
run of T on an ω-word w = a_1 a_2 ... is a sequence (q_1, γ_1), (q_2, γ_2), ... over (Q − {q_0}) × Γ
such that (q_0, a_1, q_1, γ_1) ∈ Δ and (q_i, a_{i+1}, q_{i+1}, γ_{i+1}) ∈ Δ, for each i ≥ 1. A run is accepting
if it is accepting in the sense of (Büchi) automata. We say that u = γ_1 γ_2 ... is an output
of T on w if there exists an accepting run (q_1, γ_1), (q_2, γ_2), ... of T on w.

Data automata. A data automaton (DA) is a pair (A, B), where

— A is a letter-to-letter transducer with input alphabet Σ × {⊤, ⊥} and output alphabet Γ,

— B is a finite state automaton over the alphabet Γ.

A data word w is accepted by (A, B) if the following holds.

— Profile(w) is accepted by A, yielding an output u.

— For each data value d of w, the class string u^d is accepted by B.

Data automata were introduced in the stated form in [3]. In [T] it was shown that their
expressive power is not affected if A gets Str(w) as input as opposed to Profile(w). In more
recent papers, data automata are therefore defined in the (syntactically) weaker form with
input Str(w).

3 Weak data automata 

In this section we define a new automata model for finite data words and study its
expressive power and its complexity. The model follows a similar approach as the model of data
automata. The profile of the input data word is transformed by a letter-to-letter transducer
and then further conditions on the resulting class strings are imposed. However, the
conditions that can be stated in the new automata model are much more limited than those
of a data automaton (hence the name weak data automata).

Let Γ be an alphabet. Weak data automata allow three kinds of data constraints over Γ:

1. key constraints, written in the form key(γ), where γ ∈ Γ.

2. inclusion constraints, written in the form V(γ) ⊆ ⋃_{γ'∈R} V(γ'), where γ ∈ Γ, R ⊆ Γ.

3. denial constraints, written in the form V(γ) ∩ V(γ') = ∅, where γ, γ' ∈ Γ.

Whether a data word w satisfies a data constraint C, written as w ⊨ C, is defined as
follows.

1. w ⊨ key(γ) if every two γ-positions in w have different data values.

2. w ⊨ V(γ) ⊆ ⋃_{γ'∈R} V(γ') if V_w(γ) ⊆ ⋃_{γ'∈R} V_w(γ').

3. w ⊨ V(γ) ∩ V(γ') = ∅ if V_w(γ) ∩ V_w(γ') = ∅.

If 𝒞 is a collection of data constraints, then we write w ⊨ 𝒞 if w ⊨ C for all C ∈ 𝒞.
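
The three satisfaction conditions translate directly into set operations. The sketch below is only an illustration under our own encoding: constraints are tagged tuples `("key", g)`, `("inclusion", g, R)` and `("denial", g, h)`, data words are lists of (symbol, value) pairs, and the function names are hypothetical.

```python
def values_at(word, symbol):
    """V_w(symbol): data values occurring at positions carrying `symbol`."""
    return {d for s, d in word if s == symbol}

def satisfies(word, constraint):
    """Decide w |= C for one key, inclusion or denial constraint."""
    kind = constraint[0]
    if kind == "key":            # every two g-positions have different data values
        _, g = constraint
        values = [d for s, d in word if s == g]
        return len(values) == len(set(values))
    if kind == "inclusion":      # V(g) is contained in the union of V(h), h in R
        _, g, R = constraint
        union = set().union(*(values_at(word, h) for h in R))
        return values_at(word, g) <= union
    if kind == "denial":         # V(g) and V(h) are disjoint
        _, g, h = constraint
        return values_at(word, g).isdisjoint(values_at(word, h))
    raise ValueError(f"unknown constraint kind: {kind}")

def satisfies_all(word, constraints):
    return all(satisfies(word, c) for c in constraints)

w = [("x", 1), ("y", 1), ("x", 2), ("z", 3)]
print(satisfies(w, ("key", "x")))                # True
print(satisfies(w, ("inclusion", "x", {"y"})))   # False: value 2 has no y-position
print(satisfies(w, ("inclusion", "y", {"x"})))   # True
print(satisfies(w, ("denial", "x", "z")))        # True
```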
A weak data automaton (WDA) over the alphabet Σ is a pair (A, 𝒞), where

— A is a letter-to-letter transducer with input alphabet Σ × {⊤, ⊥} and output alphabet Γ,

— 𝒞 is a collection of data constraints over the alphabet Γ.

A data word w = (a_1, d_1)···(a_n, d_n) is accepted by a WDA (A, 𝒞) if

— there is an accepting run of A on Profile(w), with an output γ_1 γ_2 ... γ_n, and

— the induced data word (γ_1, d_1)(γ_2, d_2)···(γ_n, d_n) satisfies all the constraints in 𝒞.

We write L(A, 𝒞) to denote the language that consists of all data words accepted by (A, 𝒞).
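
The acceptance condition can be checked by brute force on small inputs. The following sketch is an illustration only and makes several assumptions of ours: transitions are tuples (q, (a, s), q', γ), constraints are tagged tuples as in `("key", g)`, all outputs of accepting runs are enumerated explicitly (which is exponential in general), and every function name is hypothetical.

```python
def profile(word):
    return [(a, "⊤" if i + 1 < len(word) and word[i + 1][1] == d else "⊥")
            for i, (a, d) in enumerate(word)]

def outputs(delta, q0, final, inputs):
    """Outputs of all accepting runs of the transducer on `inputs`."""
    runs = [(q0, ())]
    for x in inputs:
        runs = [(q2, out + (g,)) for q, out in runs
                for (q1, y, q2, g) in delta if q1 == q and y == x]
    return {out for q, out in runs if q in final}

def V(word, g):
    return {d for s, d in word if s == g}

def satisfies(word, c):
    if c[0] == "key":
        vals = [d for s, d in word if s == c[1]]
        return len(vals) == len(set(vals))
    if c[0] == "inclusion":
        return V(word, c[1]) <= set().union(*(V(word, h) for h in c[2]))
    if c[0] == "denial":
        return V(word, c[1]).isdisjoint(V(word, c[2]))
    raise ValueError(c[0])

def accepts(delta, q0, final, constraints, word):
    """True iff some output on Profile(word) induces a word satisfying all constraints."""
    data = [d for _, d in word]
    for out in outputs(delta, q0, final, profile(word)):
        induced = list(zip(out, data))
        if all(satisfies(induced, c) for c in constraints):
            return True
    return False

# Identity transducer over the alphabet {a}, combined with key(a):
delta = {(q, ("a", s), "q", "a") for q in ("q0", "q") for s in ("⊤", "⊥")}
print(accepts(delta, "q0", {"q"}, [("key", "a")], [("a", 1), ("a", 2)]))            # True
print(accepts(delta, "q0", {"q"}, [("key", "a")], [("a", 1), ("a", 2), ("a", 1)]))  # False
```

The example automaton accepts exactly the nonempty data words over {a} in which every data value occurs once.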
We first discuss some extensions of WDA by the constraints that were studied in [8].

— Disjunctive key constraints are written in the form key(K), where K ⊆ Γ. Such a
constraint is satisfied by a data word if each of its classes has at most one position with
a symbol from K.

— Disjunctive inclusion constraints are written in the form ⋃_{γ∈S} V(γ) ⊆ ⋃_{γ'∈R} V(γ'),
where S, R ⊆ Γ. Such a constraint is satisfied by a data word if each class with a
position with a symbol from S also has a position with a symbol from R.

An extended weak data automaton is defined like a WDA but it further allows disjunctive
key and inclusion constraints.

Lemma 1. From each extended WDA (A, 𝒞) an equivalent WDA of polynomial size can
be constructed in polynomial time.

Proof. A disjunctive inclusion constraint ⋃_{γ∈S} V(γ) ⊆ ⋃_{γ'∈R} V(γ') can simply be replaced
by a set of inclusion constraints V(γ) ⊆ ⋃_{γ'∈R} V(γ'), one for each γ ∈ S. Disjunctive key
constraints key(K) can be replaced by a set of denial constraints V(γ) ∩ V(γ') = ∅, one for
each pair γ ≠ γ' with γ, γ' ∈ K, and a set of key constraints key(γ), one for each γ ∈ K. □
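
The replacement described in the proof can be written down as a small rewriting function. This is a sketch under our own encoding: the tags `"dkey"` and `"dinclusion"` for disjunctive constraints and the function name `expand` are hypothetical. Since denial constraints are symmetric, the sketch emits one denial constraint per unordered pair.

```python
from itertools import combinations

def expand(constraint):
    """Rewrite one (possibly disjunctive) constraint into basic WDA constraints."""
    kind = constraint[0]
    if kind == "dkey":          # ("dkey", K): at most one K-position per class
        symbols = sorted(constraint[1])
        denials = [("denial", g, h) for g, h in combinations(symbols, 2)]
        keys = [("key", g) for g in symbols]
        return denials + keys
    if kind == "dinclusion":    # ("dinclusion", S, R)
        _, S, R = constraint
        return [("inclusion", g, frozenset(R)) for g in sorted(S)]
    return [constraint]         # basic constraints are kept unchanged

print(expand(("dinclusion", {"a", "b"}, {"c"})))
print(len(expand(("dkey", {"a", "b", "c"}))))  # 6: three denial and three key constraints
```

A disjunctive key constraint over K yields |K|(|K|−1)/2 + |K| basic constraints in this encoding, which is in line with the polynomial size bound of the lemma.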

Next, we compare the expressive power of weak data automata with other automata
models for data words. More precisely, we compare it with register automata [14,1] and
data automata. Register automata are an extension of finite state automata with a fixed
number of registers in which they can store data values and compare them with the data
value of subsequent positions. For a precise definition we refer the reader to [1].

We consider the following two data languages.

— L_{a<b} consists of all data words over the alphabet {a, b} with the property that for every
a-position i there is a b-position j > i with the same data value;

— L_{a*b} is the subset of L_{a<b} where the next b-position j with the same data value as i
always satisfies j = i + 2.
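
Membership in the two languages is easy to test directly from the definitions. The sketch below assumes our own encoding of data words as lists of (label, value) tuples with labels "a" and "b"; the function names are hypothetical.

```python
def in_L_a_lt_b(word):
    """Every a-position has a later b-position with the same data value."""
    return all(
        any(x == ("b", d) for x in word[i + 1:])
        for i, (a, d) in enumerate(word) if a == "a"
    )

def in_L_a_star_b(word):
    """As above, and the next such b-position is exactly two positions later."""
    for i, (a, d) in enumerate(word):
        if a != "a":
            continue
        later = [j for j in range(i + 1, len(word)) if word[j] == ("b", d)]
        if not later or later[0] != i + 2:
            return False
    return True

w1 = [("a", 1), ("a", 2), ("b", 1), ("b", 2)]
w2 = [("a", 1), ("b", 1)]
w3 = [("b", 1), ("a", 1)]
print(in_L_a_lt_b(w1), in_L_a_star_b(w1))  # True True
print(in_L_a_lt_b(w2), in_L_a_star_b(w2))  # True False
print(in_L_a_lt_b(w3), in_L_a_star_b(w3))  # False False
```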

Lemma 2. Neither L_{a*b} nor L_{a<b} can be decided by a WDA.

Proof. We first show that no WDA decides L_{a*b}. Towards a contradiction, we thus assume
that L_{a*b} is decided by some weak data automaton (A, 𝒞) with output alphabet Γ.

To this end, let n = |Γ|^4 + 1 and let d_1, d'_1, d_2, d'_2, ..., d_n, d'_n be pairwise different data
values. We consider the data word

w = (a, d_1)(a, d'_1)(b, d_1)(b, d'_1) (a, d_2)(a, d'_2)(b, d_2)(b, d'_2) ··· (a, d_n)(a, d'_n)(b, d_n)(b, d'_n)

of length 4n. Clearly, w is in L_{a*b} and its profile is ((a, ⊥)(a, ⊥)(b, ⊥)(b, ⊥))^n.

Let γ = γ_1 γ_2 ··· γ_{4n} be an output of A on the profile of w such that (γ_1, d_1)···(γ_{4n}, d'_n) satisfies
all constraints in 𝒞. By the choice of n, there exist numbers i, j with 0 ≤ i < j < n such
that γ_{4i+1} γ_{4i+2} γ_{4i+3} γ_{4i+4} = γ_{4j+1} γ_{4j+2} γ_{4j+3} γ_{4j+4}.

Let u be the data word obtained from w by swapping the positions of the data values
d_{i+1} d'_{i+1} and d_{j+1} d'_{j+1} at the a-positions. That is,

u = (a, d_1) ··· (a, d_{j+1})(a, d'_{j+1})(b, d_{i+1})(b, d'_{i+1}) ··· (a, d_{i+1})(a, d'_{i+1})(b, d_{j+1})(b, d'_{j+1}) ··· (b, d'_n).



¹ The precursor model, finite-memory automata, was defined on "strings over infinite alphabets",
that is, essentially data strings without a Σ-component [14].



Clearly, u ∉ L_{a*b}. However, because Profile(u) = Profile(w), γ_1 γ_2 ... γ_{4n} is also an output
of A on Profile(u). Moreover, V_u(γ) = V_w(γ) for each γ ∈ Γ, and therefore the
validity of inclusion and denial constraints does not change. Furthermore, as in u and w
every data value occurs at exactly one a-position and at exactly one b-position, they cannot
be distinguished by key constraints, either. Thus, u ∈ L(A, 𝒞), the desired contradiction.
The proof for L_{a<b} is exactly the same, as w ∈ L_{a<b} and u ∉ L_{a<b} (because of the
a-position carrying d_{i+1}, whose b-position with the same data value lies to its left).

□ 
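
The swapping argument can be replayed on a small instance. The sketch below is an illustration under assumptions of ours: data values are encoded as tagged pairs, blocks are indexed from 0, the data values at the two a-positions of blocks i and j are exchanged, and the output is simply taken to be the same four symbols on every block (any output that agrees on blocks i and j would do). All names are hypothetical.

```python
from collections import Counter

def build_w(n):
    """Blocks (a,d_k)(a,d'_k)(b,d_k)(b,d'_k) for k = 1..n."""
    word = []
    for k in range(1, n + 1):
        d, e = ("d", k), ("d'", k)
        word += [("a", d), ("a", e), ("b", d), ("b", e)]
    return word

def swap_a_values(word, i, j):
    """Exchange the data values at the two a-positions of blocks i and j."""
    u = list(word)
    for off in (0, 1):
        p, q = 4 * i + off, 4 * j + off
        u[p], u[q] = (word[p][0], word[q][1]), (word[q][0], word[p][1])
    return u

def profile(word):
    return [(a, i + 1 < len(word) and word[i + 1][1] == d)
            for i, (a, d) in enumerate(word)]

def values_per_symbol(output, word):
    """For each output symbol, the multiset of data values at its positions."""
    table = {}
    for g, (_, d) in zip(output, word):
        table.setdefault(g, Counter())[d] += 1
    return table

def in_L_a_lt_b(word):
    return all(any(x == ("b", d) for x in word[i + 1:])
               for i, (a, d) in enumerate(word) if a == "a")

n, i, j = 3, 0, 2
w = build_w(n)
u = swap_a_values(w, i, j)
output = ["g1", "g2", "g3", "g4"] * n   # assumed output; blocks i and j agree
print(profile(w) == profile(u))                                      # True
print(values_per_symbol(output, w) == values_per_symbol(output, u))  # True
print(in_L_a_lt_b(w), in_L_a_lt_b(u))                                # True False
```

Equal multisets of data values per output symbol imply that key, inclusion and denial constraints cannot tell the two induced words apart, although only w belongs to the language.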

Theorem 1. (a) The class of data languages that are decided by WDA is strictly included
in the class of data languages decided by DA.
(b) The classes of languages decided by WDA and by register automata are incomparable.

Proof. Towards (a) we first show that every WDA can be translated into a DA and thus
WDA decide a subclass of DA. That the subclass is strict can then be concluded from (b),
as register automata are captured by DA [T] and thus there is a data language that can be
decided by a DA but not by a WDA.

Let thus (A, 𝒞) be a WDA. Then (A, B) is a data automaton for L(A, 𝒞), where the
automaton B tests the constraints in 𝒞 as follows.

— For every key constraint key(γ) of 𝒞, B tests that every class string has at most one
γ-position.

— For every inclusion constraint V(γ) ⊆ ⋃_{γ'∈R} V(γ'), B tests that every class string with
a γ-position also has a γ'-position, for some γ' ∈ R.

— For every denial constraint V(γ) ∩ V(γ') = ∅, B checks that classes with a γ-position
do not have any γ'-positions.
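
The three tests performed by B only look at one class string at a time. The following sketch simulates them directly on class strings rather than by building a finite automaton; the tuple encoding of constraints and all function names are assumptions of ours.

```python
from collections import defaultdict

def class_strings(word):
    """List of class strings (as lists of symbols), one per data value."""
    classes = defaultdict(list)
    for g, d in word:
        classes[d].append(g)
    return list(classes.values())

def class_ok(cls, c):
    """The per-class test for one constraint, as B would perform it."""
    if c[0] == "key":            # at most one g-position in the class
        return cls.count(c[1]) <= 1
    if c[0] == "inclusion":      # a g-position requires some h-position, h in R
        return c[1] not in cls or any(h in cls for h in c[2])
    if c[0] == "denial":         # g and h never occur in the same class
        return not (c[1] in cls and c[2] in cls)
    raise ValueError(c[0])

def check_by_classes(word, constraints):
    return all(class_ok(cls, c) for cls in class_strings(word) for c in constraints)

constraints = [("key", "x"), ("inclusion", "x", {"y"}), ("denial", "x", "z")]
good = [("x", 1), ("y", 1), ("x", 2), ("y", 2), ("z", 3)]
bad = good + [("x", 1)]          # a second x-position in the class of value 1
print(check_by_classes(good, constraints))  # True
print(check_by_classes(bad, constraints))   # False
```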

To show statement (b) we first consider the separation language L = L_{a*b}, which cannot
be decided by a WDA by Lemma 2. However, L_{a*b} can be easily decided by a register
automaton that always stores the last two data values in two registers and the information
about their symbols in its state.

On the other hand, the set of all data strings over Σ = {a} in which every data value
occurs only once can easily be decided by a WDA by the identity transducer and the key
constraint key(a), but not by a register automaton [14]. □

The complexity of the nonemptiness problem for WDA follows directly from results in
[5].

Theorem 2. The nonemptiness problem for WDA is decidable in 2-NEXPTIME.

Proof. In [5], it was shown that given an automaton A that reads profile strings and a set 𝒞
of disjunctive key and inclusion constraints, deciding whether there is a data word w such
that A accepts Profile(w) and w ⊨ 𝒞 can be done in nondeterministic doubly exponential
time.

Clearly, this is basically the same as the nonemptiness problem for WDA with
disjunctive key and inclusion constraints only. It thus only remains to show that denial constraints
can be translated into disjunctive constraints in a nonemptiness-respecting fashion. To this
end, a denial constraint V(γ_1) ∩ V(γ_2) = ∅ can be replaced as follows. We add two new
symbols γ'_1, γ'_2 and require that in each class with γ_i one γ'_i occurs, but γ'_1 and γ'_2 do not
co-occur, by two inclusion constraints V(γ_1) ⊆ V(γ'_1) and V(γ_2) ⊆ V(γ'_2) and a disjunctive
key constraint for {γ'_1, γ'_2}. □



4 A logical characterization of weak data automata 



In this section, we give a logical characterization of the data languages decided by weak data
automata in terms of existential second-order logic. The characterization is an analogue of
the Theorem of Büchi, Elgot and Trakhtenbrot [6,11,21] for string languages. This theorem
can be stated for various logics; the most interesting one for our context is that EMSO²(+1)
characterizes exactly the regular languages.

We represent data words by logical structures w = ({1, ..., n}, +1, <, {a(·)}_{a∈Σ}, ∼),
where {1, ..., n} is the set of positions, +1 is the successor relation (i.e., +1(i, j) if i + 1 = j),
< is the order relation (i.e., <(i, j) if i < j), the a(·)'s are the label relations, and i ∼ j
holds if positions i and j have the same data value. As the empty data word cannot be
properly represented, the logical characterization of WDA ignores empty data words. That
is, if some WDA (A, 𝒞) accepts the empty data string then its language is different from
the language of the corresponding formula φ: L(A, 𝒞) = L(φ) ∪ {ε}.

For a set 𝒮 ⊆ {+1, <, ∼} of relation symbols, we write FO(𝒮) for first-order logic
with the vocabulary 𝒮, MSO(𝒮) for monadic second-order logic (which extends FO(𝒮)
with quantification over sets of positions), and EMSO(𝒮) for existential monadic second-order
logic, that is, all sentences of the form ∃R_1 ... ∃R_m ψ, where ψ is an FO(𝒮) formula
extended with the unary predicates R_1, ..., R_m. By FO²(𝒮) we denote the restriction of
FO(𝒮) to sentences with two variables x and y, and by EMSO²(𝒮) the restriction of
EMSO(𝒮) where the first-order part uses only two variables.

4.1 From weak data automata to EMSO²(+1, ∼)

Theorem 3. For every weak data automaton (A, 𝒞), an equivalent EMSO²(+1, ∼)-formula
φ is constructible in polynomial time.

Proof. Let (A, 𝒞) be a weak data automaton with A = (Σ, Γ, Q, q_0, Δ, F), where Q =
{q_1, ..., q_n} and Γ = {γ_1, ..., γ_l}. We recall that we assume without loss of generality that
A uses q_0 only as its initial state.

We will construct an EMSO²(+1, ∼)-formula φ with L(A, 𝒞) − {ε} = L(φ). The
construction is the same as the classical translation from NFAs to MSO formulas. See, for
example, pO].

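The formula φ is

\[ \varphi = \exists R_{q_1} \cdots \exists R_{q_n}\, \exists R_{\gamma_1} \cdots \exists R_{\gamma_l}\, (\varphi_{part} \land \varphi_{start} \land \varphi_{trans} \land \varphi_{accept} \land \varphi_{constr}) \]

where

— φ_part asserts in a straightforward manner that R_{q_1}, ..., R_{q_n}, on the one hand, and
R_{γ_1}, ..., R_{γ_l}, on the other hand, partition the positions of the input word.

— φ_start asserts that the automaton starts in state q_0:

\[ \forall x\, \big( \neg \exists y\; x = y + 1 \rightarrow \theta \big), \]

where θ is:

\[ \forall y\, \Big( y = x + 1 \rightarrow \bigwedge_{a \in \Sigma} \Big[ \Big( (a(x) \land x \sim y) \rightarrow \bigvee_{(q_0,(a,\top),q,\gamma) \in \Delta} (R_q(x) \land R_\gamma(x)) \Big) \land \Big( (a(x) \land x \nsim y) \rightarrow \bigvee_{(q_0,(a,\bot),q,\gamma) \in \Delta} (R_q(x) \land R_\gamma(x)) \Big) \Big] \Big) \]

— φ_trans asserts that transitions are simulated correctly:

\[ \forall x \forall y\, \Big( y = x + 1 \rightarrow \bigwedge_{a \in \Sigma,\, q \in Q} \Big[ \Big( (a(y) \land R_q(x) \land \exists x\, (x = y + 1 \land y \sim x)) \rightarrow \bigvee_{(q,(a,\top),q',\gamma) \in \Delta} (R_{q'}(y) \land R_\gamma(y)) \Big) \land \Big( (a(y) \land R_q(x) \land \exists x\, (x = y + 1 \land y \nsim x)) \rightarrow \bigvee_{(q,(a,\bot),q',\gamma) \in \Delta} (R_{q'}(y) \land R_\gamma(y)) \Big) \Big] \Big) \]

— φ_accept states the accepting condition:

\[ \forall x\, \Big[ \neg \exists y\; y = x + 1 \rightarrow \bigvee_{q \in F} R_q(x) \Big] \]

— φ_constr is a conjunction ⋀_{C∈𝒞} ψ_C over all constraints C from 𝒞 such that,

• if C is a key constraint key(γ), then

\[ \psi_C = \forall x \forall y\, [(R_\gamma(x) \land R_\gamma(y) \land x \sim y) \rightarrow x = y] \]

• if C is an inclusion constraint V(γ) ⊆ ⋃_{γ'∈R} V(γ'), then

\[ \psi_C = \forall x \exists y\, \Big[ R_\gamma(x) \rightarrow \Big( \bigvee_{\gamma' \in R} R_{\gamma'}(y) \land x \sim y \Big) \Big] \]

• if C is a denial constraint V(γ) ∩ V(γ') = ∅, then

\[ \psi_C = \forall x \forall y\, [(R_\gamma(x) \land R_{\gamma'}(y)) \rightarrow x \nsim y] \]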

The length of φ is O(|Σ||Q||Δ| + |𝒞|). The correctness is straightforward and, thus, omitted.

□ 
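
To see how the constraint formulas behave, the three shapes of ψ_C for key, inclusion and denial constraints can be evaluated by brute-force quantification over positions. The sketch below is only an illustration: the output symbol of each position plays the role of the guessed relations R_γ, the word is given as two parallel lists of our own choosing, and the function names are hypothetical.

```python
def psi_key(symbols, values, g):
    """forall x forall y [(R_g(x) and R_g(y) and x ~ y) -> x = y]"""
    pos = range(len(symbols))
    return all(not (symbols[x] == g and symbols[y] == g and values[x] == values[y])
               or x == y
               for x in pos for y in pos)

def psi_inclusion(symbols, values, g, R):
    """forall x exists y [R_g(x) -> (some R_h(y) with h in R, and x ~ y)]"""
    pos = range(len(symbols))
    return all(any(symbols[x] != g or (symbols[y] in R and values[x] == values[y])
                   for y in pos)
               for x in pos)

def psi_denial(symbols, values, g, h):
    """forall x forall y [(R_g(x) and R_h(y)) -> not x ~ y]"""
    pos = range(len(symbols))
    return all(not (symbols[x] == g and symbols[y] == h) or values[x] != values[y]
               for x in pos for y in pos)

symbols = ["x", "y", "x", "z"]
values = [1, 1, 2, 3]
print(psi_key(symbols, values, "x"))               # True
print(psi_inclusion(symbols, values, "x", {"y"}))  # False
print(psi_inclusion(symbols, values, "y", {"x"}))  # True
print(psi_denial(symbols, values, "x", "z"))       # True
print(psi_denial(symbols, values, "x", "y"))       # False
```

Each function uses only the two position variables x and y, mirroring the two-variable restriction of the logic.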



4.2 From EMSO²(+1, ∼) to weak data automata

In the following, we use the abbreviation F(x, y) for the formula ¬(y = x + 1) ∧ ¬(x =
y + 1) ∧ x ≠ y, which states that the distance of x and y is at least two.

Theorem 4. There is an algorithm that translates every EMSO²(+1, ∼)-formula φ into
an equivalent weak data automaton (A, 𝒞) in doubly exponential time. In particular, the
output alphabet Γ of A and the number of constraints in 𝒞 are at most exponential.



Proof. In the first step, the algorithm transforms φ into an equivalent EMSO²(+1, ∼)
formula in Scott normal form (SNF) of the form

\[ \psi = \exists R_1 \cdots \exists R_n \Big[ \forall x \forall y\; \chi \;\land\; \bigwedge_{i=1}^{m} \forall x \exists y\; \chi_i \Big], \]

where χ and each χ_i are quantifier-free [13]. The size of ψ is linear in the size of φ; in
particular, n = O(|φ|) and m = O(|φ|).

Then it rewrites the formula χ into an at most exponential conjunction

\[ \chi = \bigwedge_j \neg \big( \alpha_j(x) \land \beta_j(y) \land \delta_j(x,y) \land \epsilon_j(x,y) \big), \]

where, for every j, α_j, β_j are conjunctions of literals with unary relation symbols, δ_j is
either x ∼ y or x ≁ y, and ε_j(x, y) is one of x = y, y = x + 1, F(x, y).
Likewise, it rewrites every χ_i into an at most exponential disjunction

\[ \chi_i = \bigvee_j \big( \alpha^i_j(x) \land \beta^i_j(y) \land \delta^i_j(x,y) \land \epsilon^i_j(x,y) \big), \]

where the atomic formulas are of the respective forms as above.

The idea of the construction is that A guesses a couple of relations that allow us to state
some of the properties expressed in ψ by constraints of 𝒞.

For simplicity we refer to the label relations a_1, ..., a_l as R_{n+1}, ..., R_{n+l}.

The relations that are guessed are the following.

— R_1, ..., R_n (the SNF relations). We refer to the full atomic type of a position with
respect to the relations R_1, ..., R_{n+l} as its SNF-type;

— P_1, P_2, P_3 with the following intention: if a class contains at least three positions of some
SNF-type α, then one of them is in P_3. If a class contains at least two α-positions, then
one of them is in P_2. If there is at least one α-position then there is one in P_1.

— C_1, C_2, C_3 with the following intention: if there are at least three classes that contain
positions of some SNF-type α, then in one of these classes all α-positions are in C_3.
If there are at least two classes that contain α-positions, then in one of them all
α-positions are in C_2. If there is at least one class with α-positions then in one of them
all α-positions are in C_1. We refer to the full atomic type of a position with respect to
P_1, P_2, P_3, C_1, C_2, C_3 as its occurrence type;

— E_←, E_→ with the intention that a position is in E_← if its left neighbor has the same
data value (and likewise for E_→);

— R^l_1, ..., R^l_{n+l}, P^l_1, P^l_2, P^l_3, C^l_1, C^l_2, C^l_3 and R^r_1, ..., R^r_{n+l}, P^r_1, P^r_2, P^r_3, C^r_1, C^r_2, C^r_3 with the
following intention: for each position p, it should hold that p is in a relation with
superscript l if its left neighbor is in the corresponding relation without superscript.
Likewise, p is in a relation with superscript r if its right neighbor is in the corresponding
relation without superscript. We refer to the type of a position with respect to these
relations and the relations E_←, E_→ as its neighborhood type;

² The case x = y + 1 does not need to be considered as it can be obtained by swapping x and y.
³ More precisely, A guesses, for each position p, the set of those relations that contain p. However,
on a global level, we refer to this as "guessing the relations".



— for every j ≤ n + l and i ≤ m the relations W^i_j and E^i, G^i_{+1}, G^i_{-1}, G^i_F with the following
intention: if for some position p, formula χ_i becomes true for x = p and some position
y = q, then p ∈ W^i_j if and only if q ∈ R_j. That is, W^i_1, ..., W^i_{n+l} mimics the SNF-type of
witness positions with respect to χ_i. Furthermore, p ∼ q if and only if p ∈ E^i, q = p + 1
if and only if p ∈ G^i_{+1}, p = q + 1 if and only if p ∈ G^i_{-1}, and F(p, q) if and only if
p ∈ G^i_F. We refer to the type with respect to these relations as the witness type.

It should be noted that the number of these relations is O(nm).

Now we describe how A and 𝒞 can be constructed in order to test whether a data string
w satisfies ψ.

First, the automaton A guesses which types will occur in the output and verifies during
its computation that exactly these types occur. What needs to be shown is how A and 𝒞 can
ensure that the relations guessed by A are consistent with respect to the intention that
was described above.

— The consistency with respect to P_1, P_2 and P_3 can be tested as follows. A can ensure
that each position is in at most one P_k. The intention of P_1, P_2 and P_3 can be enforced
by the inclusion constraints

• V(α ∧ ¬P_1 ∧ ¬P_2) ⊆ V(α ∧ P_3),

• V(α ∧ P_3) ⊆ V(α ∧ P_2), and

• V(α ∧ P_2) ⊆ V(α ∧ P_1).

That each class contains at most one position with α ∧ P_k, for each k, can be stated
by key constraints.

— The consistency with respect to C_1, C_2 and C_3 can be tested in a similar fashion. That,
for some α, the existence of an (α ∧ C_3)-class implies the existence of an (α ∧ C_2)-class,
and the other two corresponding conditions, can already be guaranteed when A guesses
the set of occurring types. That all α-positions in an (α ∧ C_k)-class are in C_k can be
ensured by denial constraints.

— The consistency of the neighborhood types can be easily tested by A with the help of
the profile information.

— For the consistency of the witness types, A checks that for each position p and each
i ≤ m there actually exists a disjunct (α^i_j(x) ∧ β^i_j(y) ∧ δ^i_j(x, y) ∧ ε^i_j(x, y)) of χ_i for which
α^i_j is the SNF-type of p and β^i_j, δ^i_j and ε^i_j coincide with the i-th witness type of p. How
the existence of corresponding witness positions is tested will be described below.

Now we describe how the conjuncts of χ can be checked. For every conjunct (indexed by j)
we distinguish the following cases depending on the possible formulas δ_j(x, y) and ε_j(x, y).

(Case 1) ε is x = y: in this case the conjunct states a condition about forbidden SNF-types
of positions. This kind of constraint can be ensured by A by not allowing
it to guess them.

(Case 2) ε is y = x + 1: such conjuncts state that some pairs of SNF-types are forbidden for
neighbors with equal (or different) data values. As this is a question of consistency
between the neighborhood type and the witness type of a position, it can be
guaranteed by A (by disallowing certain combinations).

(Case 3) ε is F(x, y) and δ is x ∼ y: such a formula states that there should not be an
α-position p and a β-position q in the same class with |p − q| > 1. Such a formula
gives rise to some denial constraints. As an example, if the SNF-type of a position
p is α and the neighborhood type of p indicates that it has a (P_1 ∧ β)-position
and a (P_2 ∧ β)-position as neighbors, then there is a denial constraint forbidding
(P_3 ∧ β)-positions in this class.

(Case 4) ε is F(x, y) and δ is x ≁ y: such a formula states that α-positions p and β-positions
q with |p − q| > 1 need to be in the same class. This can be tested by A and some
denial constraints with the help of C_1, C_2, C_3 and P_1, P_2, P_3.

What remains to be shown is how the formulas $\chi_i$ can be tested. For each position $p$ and
each $i \le m$, (A, C) has to check that there is a witness position $q$ such that $p$ and $q$ satisfy
some disjunct of $\chi_i$. Which disjunct should be considered is given by the $i$-th witness type
of $p$. The existence of corresponding positions can be tested in a way that is similar to the
tests for formula $\chi$. If $\epsilon$ is $x = y$ the witness needs to be $p$ itself, which can be tested by A
directly. If $\epsilon$ is $y = x + 1$ or $x = y + 1$ the witness is one of the neighbors of $p$. Whether this
is true can be concluded from the neighborhood type and thus also by A. If $\epsilon$ is $F(x, y)$
the existence of witnesses can be checked by inclusion constraints if $\delta$ is $x \sim y$ and by A
directly if $\delta$ is $x \not\sim y$. In the former case, the $P_k$-relations are used; in the latter case, both
the $P_k$- and the $C_k$-positions.

The size of the output alphabet of A and the number of constraints are at most exponential
in $|\varphi|$. The number of states is at most doubly exponential. □

We note that in the upper bound of the algorithm for nonemptiness of WDA transferred
from [8], the doubly exponential term only depends on the alphabet size. By combining this
with the bounds of Theorem 4 we obtain a 3-NEXPTIME upper bound for satisfiability of
$\mathrm{FO}^2(+1, \sim)$ (which is worse than the bound in [H]). We also note that the construction
underlying the proof of Theorem 4 can be turned into a nondeterministic exponential
time reduction from satisfiability for $\mathrm{FO}^2(+1, \sim)$ to nonemptiness for WDA, resulting in an
automaton with a singly exponential number of states. The reduction guesses the order in
which types appear in the accepted string (as opposed to the construction in the proof of
Theorem 4).

The previous two theorems yield the following logical characterization of WDA. 

Theorem 5. Weak data automata and $\mathrm{EMSO}^2(+1, \sim)$ are equivalent in expressive power.

We note that on strings $\mathrm{EMSO}^2(+1)$ and $\mathrm{EMSO}^2(+1, <)$ are expressively equivalent. It
is an interesting consequence of the above characterization that this equivalence does not 
hold for data strings. 

Corollary 1. The logic $\mathrm{EMSO}^2(+1, \sim)$ is strictly less expressive than $\mathrm{EMSO}^2(+1, <, \sim)$.

Proof. The inclusion holds by definition. It is strict because

— the language $L_{a<b}$ cannot be decided by a WDA (Lemma 2) and thus cannot be defined
in $\mathrm{EMSO}^2(+1, \sim)$,

— but it can be expressed by the simple formula $\forall x \exists y\, (a(x) \to (b(y) \wedge x < y \wedge x \sim y))$.

□
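To make the separating property concrete, the following minimal Python sketch evaluates the formula from the proof on a finite data word. The representation of a data word as a list of (symbol, data value) pairs and the function name `satisfies_a_before_b` are assumptions made only for this illustration.

```python
from typing import List, Tuple

# Hypothetical encoding: one (symbol, data value) pair per position.
DataWord = List[Tuple[str, int]]


def satisfies_a_before_b(word: DataWord) -> bool:
    """Check that every a-position has a later b-position in the same class."""
    for x, (sym_x, val_x) in enumerate(word):
        if sym_x != "a":
            continue  # the implication a(x) -> ... holds trivially
        has_witness = any(
            sym_y == "b" and val_y == val_x for sym_y, val_y in word[x + 1:]
        )
        if not has_witness:
            return False
    return True
```

The check needs the order `<` on positions together with data equality, which is exactly the combination that Lemma 2 shows to be out of reach for WDA.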



5 Weak Büchi data automata



In this section we consider automata and logics for data $\omega$-words, that is, data words of
infinite length. Weak data automata (A, C) can easily be adapted for data $\omega$-words. The
automaton A is simply interpreted as a letter-to-letter Büchi transducer. A run is accepting
if it visits a state from $F$ infinitely often. We refer to the resulting model as weak Büchi data
automata (WBDA). We write $L^\omega(A, C)$ for the set of data $\omega$-words accepted by (A, C). The
results regarding the expressive power of WDA compared with other automata models easily
carry over to WBDA.

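Data $\omega$-words can be represented by logical structures

$w = \langle \mathbb{N}, +1, <, \{a(\cdot)\}_{a \in \Sigma}, \sim \rangle$, (1)

where $\mathbb{N}$ is the set $\{1, 2, \ldots\}$ of natural numbers which represent the positions and the other
relations are as in the case of data words. For a set $\mathcal{S} \subseteq \{+1, <, \sim\}$ of relation symbols,
$\mathrm{E}_\infty\mathrm{MSO}^2(\mathcal{S})$ consists of all formulas of the form

$\exists_\infty R_1 \cdots \exists_\infty R_m \exists S_1 \cdots \exists S_\ell\, \varphi$, (2)

where $\varphi \in \mathrm{FO}^2(\mathcal{S})$. Here all relation symbols $R_i, S_i$ are unary. The $\exists_\infty$ quantifiers are semantically
restricted to bind to infinite sets only.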

Remark 1. It is folklore that languages (without data) accepted by Büchi automata are
precisely the languages expressible by formulae of the form

$\exists_\infty R_1 \cdots \exists_\infty R_m \exists S_1 \cdots \exists S_\ell\, \varphi$

for some $\varphi \in \mathrm{FO}^2(+1)$. However, we have not found an explicit reference for this result in
the literature.

The following theorem is a straightforward generalization of Theorem 5.

Theorem 6. Weak Büchi data automata and $\mathrm{E}_\infty\mathrm{MSO}^2(+1, \sim)$ are equivalent in expressive
power.

Proof. The translation from an automaton to a formula uses one additional relation symbol
$R$ which is quantified by an $\exists_\infty$ symbol. The formula $\varphi_{\mathrm{accept}}$ used in the proof of Lemma 3
is then replaced by

$\forall x\, \bigl(R(x) \to \bigvee_{q \in F} R_q(x)\bigr)$.

For the opposite translation, it can be checked with the help of the Büchi condition
that relations quantified with $\exists_\infty$ are indeed infinite. □

Theorem 7. The nonemptiness problem for weak Büchi data automata is decidable in
2-NEXPTIME. 



Proof. We show in the following that the nonemptiness problem for WBDA can be polynomially
reduced to the nonemptiness problem for WDA. The result then follows from
Theorem O The approach is a classical one. We show that if the language of a WBDA 
(A, C) is non-empty then a finite data string of the form $uv$ can be constructed such that
there is a run of A which loops over $v$. The "unravelling" $uv^\omega$ is then also accepted by the
automaton. However, some care is needed to assign data values in a suitable manner.
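For readers less familiar with the classical approach mentioned here, the following Python sketch shows the lasso test for a plain Büchi automaton without data. It only illustrates the $uv^\omega$ idea: letters, profiles and the constraints in C are deliberately ignored, and the successor-graph encoding and all names are assumptions of this sketch, not part of the construction in the proof.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # state -> set of successor states


def reachable(graph: Graph, sources: Iterable[Hashable]) -> Set[Hashable]:
    """Return all states reachable from `sources` (the sources included)."""
    seen = set(sources)
    stack = list(seen)
    while stack:
        state = stack.pop()
        for succ in graph.get(state, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def buchi_nonempty(graph: Graph, initial: Hashable, accepting: Set[Hashable]) -> bool:
    """True iff some accepting state is reachable and lies on a cycle."""
    for q in reachable(graph, [initial]):
        # q lies on a cycle iff q is reachable from one of its own successors.
        if q in accepting and q in reachable(graph, graph.get(q, ())):
            return True
    return False
```

A path from the initial state to such a state q plays the role of u, and a cycle through q plays the role of v. The remainder of the proof is concerned with what this sketch leaves out, namely a consistent assignment of data values to the infinitely many copies of v.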

Let (A, C) be a WBDA with $A = (\Sigma, \Gamma, Q, q_0, \Delta, F)$. Since we are only interested in
whether $L^\omega(A, C) \neq \emptyset$, we can assume, without loss of generality, that the transitions of
A are all of the form $(q, \gamma, q', \gamma)$. Otherwise, we can replace it by a transducer which reads
$\Gamma$-strings and guesses, for every position $i$, a symbol $\sigma_i \in \Sigma$ and its profile symbol $s_i$ (storing
them in the state) and verifies that its output would be (the actual input symbol)
$\gamma_i$. Therefore, we consider A in this proof just as a normal Büchi automaton that gets a
$\Gamma$-string as input. The constraints are applied to the same string.

We first fix some notation. We refer to the symbols that occur in key constraints of C 
as key symbols. 

A zone is a finite data string over $\Gamma$ in which all positions carry the same data value.
An $\omega$-zone is an infinite data string over $\Gamma$ in which all positions carry the same data
value. The zones of a data string $w$ are the maximal zones of $w$. An adorned zone is a zone
together with a pair $(q, q')$ of states of A. We write a-Proj($z$) for the triple (Str($z$), $q$, $q'$) of
a zone $z$ that is adorned with the pair $(q, q')$.
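As a small illustration of these notions, the following Python sketch splits a finite data string into its maximal zones and extracts the string projection of a zone. The encoding of a data string as a list of (symbol, data value) pairs and the function names are assumptions of this sketch.

```python
from itertools import groupby
from typing import List, Tuple

Position = Tuple[str, int]  # hypothetical encoding: (symbol, data value)


def zones(word: List[Position]) -> List[List[Position]]:
    """Split a data string into its maximal zones (runs of equal data values)."""
    return [list(group) for _, group in groupby(word, key=lambda pos: pos[1])]


def str_projection(zone: List[Position]) -> str:
    """Str(z): the symbols of a zone, with the data value dropped."""
    return "".join(symbol for symbol, _ in zone)
```

Note that two different zones may carry the same data value as long as they are not adjacent; they then belong to the same class.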

We next define an important notion for this proof, (singular and non-singular) witnesses.
We will show that the nonemptiness of (A, C) boils down to deciding whether such
witnesses exist. Singular witnesses correspond to data strings in $L^\omega(A, C)$ with an infinite
zone whereas non-singular witnesses correspond to data strings with finite zones only.

A singular witness for (A, C) is a data string $uv$ over $\Gamma$ with the following properties.

— $uv \models C$.

— There is a state $q \in F$ and a (partial) run $\rho = \rho_u\rho_v$ of A on input Profile($uv$)$_\top$ in which
the state after reading $u$ and after reading $v$ is $q$. Here, Profile($uv$)$_\top$ denotes the profile
string that is obtained from Profile($uv$) by setting the last profile symbol to $\top$.

— All positions of $v$ and the last zone of $u$ carry the same data value and $v$ does not carry
any key symbol.

A non-singular witness for (A, C) is a data string $uv$ over $\Gamma$ which fulfills the following
conditions.

— All zones in $uv$ are of length at most $|Q|(|\Gamma| + 1)$.

— The data value of the last position of $u$ is different from the value of the first position
of $v$.

— There is a state $q$ and a (partial) run $\rho = \rho_u\rho_v$ of A on input $uv$ in which the state
after reading $u$ and after reading $v$ is $q$. Furthermore, $\rho_v$ contains some state from $F$.
In the following, each zone $z$ of $uv$ is adorned by the pair $(q, q')$ where $q$ is the state of
$\rho$ before reading $z$ and $q'$ is the state after reading $z$.

— The classes of $uv$ can be colored* with the four colors black, yellow, white and blue such
that all black, yellow and white classes satisfy all constraints from C and furthermore
the following conditions hold.
(black) There are at most $3|Q|^2$ black classes. There are no key symbols in black zones of
$v$. Furthermore, it is not the case that the first zone and the last zone of $v$ are from
the same black class.
(yellow) There are at most $|Q|^2$ yellow classes and they consist of at most $|\Gamma|$ zones. All
these zones are located in $v$.
(white) All zones of the white classes are located in $u$.
(blue) For each blue zone $z$ there is a yellow zone $z'$ such that a-Proj($z$) = a-Proj($z'$).

The proof of decidability of the nonemptiness problem for WBDA now reduces to
proving the following three claims.

(Claim 1) If there exists a witness for (A, C) then $L^\omega(A, C) \neq \emptyset$.
(Claim 2) If $L^\omega(A, C) \neq \emptyset$ then there exists a witness for (A, C).
(Claim 3) There is a nondeterministic algorithm which constructs, for every WBDA (A, C),
in polynomial time some WDA (A', C') such that every possible (A', C') accepts
only witnesses for (A, C) and for each witness $uv$ there is a run of the algorithm
producing some (A', C') that accepts $uv$.

Therefore, the nonemptiness problem for WBDA can indeed be reduced nondeterministically
in polynomial time to the nonemptiness problem for WDA.
Next, we prove Claims 1-3.

We start by proving Claim 1. Let us assume there is a singular witness $uv$ for (A, C)
(where all names are chosen as above). It is easy to see that in this case $\rho_u\rho_v^\omega$ is an accepting
run of A on input $uv^\omega$ and that $uv^\omega$ satisfies all constraints from C. It should be noted
that in $uv^\omega$ all positions in the $v^\omega$-part and some non-empty suffix of $u$ carry the same
data value and thus constitute one infinite zone. The repetition of $v$ does not introduce
any violations of C as $v$ does not contain any key symbols.

We next consider the case that $uv$ is a non-singular witness for (A, C) (where again all
names are chosen as above). In principle, we aim again for a data $\omega$-string in $L^\omega(A, C)$ that
is obtained from $uv$ by repeating $v$ infinitely often. Indeed, by doing so, we obtain a data
$\omega$-string whose adorned projection is just a-Proj($u$) a-Proj($v$)$^\omega$. However, the data values
cannot be the same in every copy of $v$ as otherwise constraints might be violated.

The basic idea for the assignment of data values is as follows. As white zones only 
appear in u they are not affected and we do not need to adapt them. As the black zones in 
$v$ do not contain any key symbols, we can leave them unchanged in each of the infinitely
many copies of v that constitute w. It only remains to assign data values to the blue zones 
in u and to the blue and yellow zones in v and the copies of v. To this end, we intuitively use 
the yellow classes as templates. More precisely, we make sure that for every new class that 
is constituted by assigning data values, the set of zones corresponds to one of the yellow 
classes of v, that is, it has the same set of (adorned) zones as that class. This ensures that 
each new class satisfies C. 

* Each class gets exactly one color. We refer to zones and positions in a black class as black zones
and positions, respectively, and likewise for the other colors.
** We do not require that the blue classes satisfy C.



We now describe the construction of the data $\omega$-string $w$ in more detail.

— Let $w_1 = uv^\omega$. Clearly, $\rho_u\rho_v^\omega$ is an accepting run on Profile($uv^\omega$).

— In the remainder of the construction, only data values are changed, but zone projections
and runs remain the same.

— Let $w_2$ be a copy of $w_1$ where data values of blue and yellow zones are removed (and
thus the black and white zones are just as in $u$ and $v$). As the last zone of $v$ is not from
the same black class as the first zone of $v$, it cannot happen that two black zones with
the same data value become adjacent by repeating $v$.

— Next, we choose an infinite sequence $d_1, d_2, \ldots$ of data values that do not occur in black
or white zones. We assign data values to the blue and yellow zones in $w_2$ by repeating
the following procedure from left to right. In the $i$-th application we constitute a new
class by assigning the data value $d_i$ to a set of zones that corresponds to some yellow
class.

• We pick the first yellow or blue zone $z$ that does not yet have a data value. We
choose a yellow class $c$ of $uv$ that contains a zone $z'$ with a-Proj($z$) = a-Proj($z'$). This
is possible as $z$ is either such a zone itself or it was a blue zone in $uv$ and thus
such a zone $z'$ exists by the requirements for blue zones. Let $z'_1, \ldots, z'_k$ be the other
zones of the class $c$. For each $j \le k$ we choose some zone $z_j$ of $w_2$ that has not yet
received a data value and fulfills a-Proj($z_j$) = a-Proj($z'_j$). We require that the zones
$z, z_1, \ldots, z_k$ are pairwise not adjacent. As each yellow zone of $v$ is copied infinitely
often in $w_2$, such zones $z_1, \ldots, z_k$ do exist. We assign to $z, z_1, \ldots, z_k$ the data value
$d_i$. We note that the new class has exactly the same zone profile as the yellow class
$c$ and therefore satisfies all constraints.

We denote the resulting data $\omega$-word by $w$.

It remains to show that indeed $w \in L^\omega(A, C)$. As $\rho$ is an accepting run yielding $w$ it
only remains to show that all classes of $w$ satisfy C.

• As white classes have not been changed at all, they clearly satisfy C. 

• As the black zones in v do not contain any key symbols, repeating them in w 
does not introduce any violations of key constraints. Otherwise, the repetition does 
not change the set of occurring symbols for any black class and therefore also the 
inclusion and denial constraints remain valid. 

• Each other class of $w$ has the same set of profiles as some yellow class $c$ of $uv$. Thus,
it satisfies all constraints from C just as c does. 

This concludes the proof of Claim 1. 

For the proof of Claim 2 let $L^\omega(A, C) \neq \emptyset$ and let $\rho$ be an accepting run on the data
$\omega$-word $w$ with $w \models C$. We consider two cases, depending on whether the number of zones
in $w$ is finite or infinite. We first consider the simpler case, in which the number of zones in
$w$ is finite. In this case $w = u'v'$, for some finite data string $u'$ and an infinite data string
$v'$ such that

— all positions of v' have the same data value d, 

— there are no key symbols in v' . 



The latter can be achieved as there is only a finite number of key symbols in the infinite 
zone. 

As $\rho$ is accepting, some accepting state $q$ occurs infinitely often in the run $\rho$ on $v'$. Let
$u$ be the prefix of $u'v'$ until (and including) the position of $v'$ after which $q$ occurs for the
first time. Let $v$ be the substring of $v'$ from the first to the second occurrence of $q$. Clearly,
$uv$ is a singular witness for (A, C).

Now we turn to the case, where w has an infinite number of zones and therefore all zones 
are of finite length. The construction of u and v consists of a number of transformation 
steps of w. We refer to the data string obtained after the i-th transformation step as Wi. 
In these transformations we intuitively view each Wi as an infinite sequence of zones. We 
might replace zones by other zones but we never change the sequence of states that p takes 
on the sequence of zone borders. We call the sub-sequence of states that a run $\rho$ takes at
zone borders the zone sub-run of $\rho$.

Thus, our transformations do not change the zone sub-run of our accepting run. Without
loss of generality, we assume that whenever A assumes a state from $F$ after some position
$p$ it also assumes an accepting state after the next symbol with profile $\bot$. This can be
accomplished by an easy modification of A. Now we can be sure that in the zone
sub-run some accepting state occurs infinitely often.

The first transformation step transforms $w$ into a data string $w_1$ in which each zone
has length at most $|Q|(|\Gamma| + 1)$. This step is applied to each zone $z$ independently. If
$|z| \le |Q|(|\Gamma| + 1)$, nothing has to be done. Otherwise, we mark a set of positions of $z$
such that the first and last position are marked and, for every symbol $\gamma$ that occurs, one
occurrence is marked. Thus, at most $|\Gamma| + 1$ positions are marked. As $|z| > |Q|(|\Gamma| + 1)$
there must be a sequence of at least $|Q|$ unmarked positions in $z$. We consider the state
of $\rho$ before the first of these positions and after each of them. Clearly, in this sequence
of states some state $q$ must occur twice. By removing the data string between the two
occurrences of $q$ we obtain a shorter zone $z'$ with the same set of symbols. Furthermore, $z'$
can be obtained by a partial run of A with the same first and last state as for $z$ in $\rho$, thus
the zone sub-run does not change. We note that by removing symbols no key conflicts can
be introduced. The repeated application of this process to each zone yields a data word $w_1$
that is accepted by a run of A with the same zone sub-run as before and for which $w_1 \models C$.
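The shortening of a single zone can be sketched in Python as follows. This is a simplified illustration under loudly stated assumptions: the automaton is modelled by a hypothetical deterministic transition table `delta` over plain symbols (profiles are ignored), the run is recomputed after every cut, and the first occurrence of each symbol is the one that gets marked. None of these choices is prescribed by the proof.

```python
from typing import Dict, List, Optional, Set, Tuple

Delta = Dict[Tuple[int, str], int]  # hypothetical deterministic transition table


def run_states(symbols: List[str], start: int, delta: Delta) -> List[int]:
    """states[k] is the state before reading position k; states[-1] is the exit state."""
    states = [start]
    for s in symbols:
        states.append(delta[(states[-1], s)])
    return states


def find_cut(states: List[int], marked: Set[int]) -> Optional[Tuple[int, int]]:
    """Find i < j with states[i] == states[j] and no marked position in i..j-1."""
    last_seen: Dict[int, int] = {}
    for k, q in enumerate(states):
        if q in last_seen:
            return last_seen[q], k
        last_seen[q] = k
        if k in marked:
            last_seen.clear()  # a cut must not remove the marked position k
    return None


def shorten_zone(symbols: List[str], start: int, delta: Delta, bound: int) -> List[str]:
    """Cut unmarked loops out of a zone until it has length at most `bound`."""
    symbols = list(symbols)
    while len(symbols) > bound:
        marked = {0, len(symbols) - 1}  # first and last position
        seen: Set[str] = set()
        for k, s in enumerate(symbols):
            if s not in seen:  # mark one occurrence of every symbol
                seen.add(s)
                marked.add(k)
        cut = find_cut(run_states(symbols, start, delta), marked)
        if cut is None:  # not expected for bound = |Q| * (|Gamma| + 1), by pigeonhole
            break
        i, j = cut
        del symbols[i:j]
    return symbols
```

Each cut removes only unmarked positions between two occurrences of the same state, so the set of symbols, the first and last position, and the entry and exit states of the zone are preserved.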

We select, for each class $c$ of $w_1$, and each symbol $a$ that occurs in $c$, one zone $z$ of
$c$ that contains $a$. We call these selected zones the core zones of $c$ and the other zones
redundant zones. Clearly, each class has at most $|\Gamma|$ core zones and the remaining zones do not
contain any key symbols. Thus, if a redundant zone is removed from a class or a copy of a
redundant zone is added to a class the validity of constraints is not affected.
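The selection of core zones can be sketched as follows in Python. The encoding of the zone sequence as a list of (string projection, data value) pairs is an assumption of this sketch, and picking the leftmost zone for each symbol is just one admissible choice; the proof only requires that some zone is selected per class and symbol.

```python
from typing import Dict, List, Set, Tuple

Zone = Tuple[str, int]  # hypothetical encoding: (string projection, data value)


def core_zone_indices(zone_list: List[Zone]) -> Set[int]:
    """Select, per class and per symbol occurring in it, the leftmost zone with that symbol."""
    chosen: Dict[Tuple[int, str], int] = {}
    for index, (symbols, value) in enumerate(zone_list):
        for symbol in symbols:
            # setdefault keeps the first (leftmost) zone found for this class and symbol
            chosen.setdefault((value, symbol), index)
    return set(chosen.values())
```

All zones whose index is not returned are redundant in the sense above. Since one zone is selected per symbol, a class never has more core zones than it has distinct symbols.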

In a nutshell, the remaining transformation steps do the following. First of all, we collect
all redundant zones in a finite number of classes. These will be the black classes and they
are the only classes that might have an infinite number of zones. From the remaining classes
we first distinguish those that contain a zone adornment $(q, q')$ that occurs only a finite
number of times. These will be the white classes and there are only finitely many of them.
The remaining classes consist only of core zones with adornments that occur infinitely
often. We single out a polynomial number of such classes, the yellow classes, that cover all
"infinite adornments" and in all remaining classes, the blue ones, we replace all zone strings
by strings from yellow zones, thereby ensuring that there exists only a polynomial number
of different zone strings outside black and white classes. We now continue the detailed
description of the construction.

In the next step, we transform $w_1$ into a data string $w_2$ in which at most $3|Q|^2$ classes
have redundant zones. Thus, in particular, at most $3|Q|^2$ classes have infinitely many zones.
To this end, we proceed as follows, for every pair $(q, q')$ of states of A. If $(q, q')$ occurs as
adornment of any redundant zone of $w_1$ we pick the (up to) first three classes that contain
such zones. We color all these classes black.

Next, we modify all redundant* zones $z$ that are not (yet) in a black class in a left-to-right
fashion as follows. Let $(q, q')$ be the adornment of such a zone $z$. As $z$ is not black
there must be three black classes $c_1, c_2, c_3$ with $(q, q')$-adorned redundant zones $z_1, z_2, z_3$,
respectively. We replace $z$ by one of $z_1, z_2, z_3$ that has a different data value from the
zones adjacent to $z$. Although this step might change the string projections of zones, the
resulting data string still has the same zone sub-run and is therefore still accepted by A.
Furthermore, as only redundant zones were removed from non-black classes, these classes
still satisfy C. And, as in black classes only copies of redundant zones are added, they also
still satisfy C.

We call a state pair {q, q') frequent if it occurs as adornment of infinitely many non-black 
zones, otherwise infrequent. Clearly, there is only a finite number of classes that contain 
zones with infrequent adornment. We color these classes white. 

If all classes are black or white then the construction is finished. Otherwise, the adornment
of all zones that are neither black nor white is frequent.

The main goal of the final transformation step is to reduce the number of different
string projections that occur in zones that are neither black nor white. We note that this
transformation step might cause violations of constraints for blue or yellow classes (but this
does not matter as long as we obtain a witness). In the following, we choose three positions
$p_1, p_2, p_3$ such that one of them ($p \in \{p_1, p_2\}$) will mark the end of $u$ and such that $v$ will
(basically) be the data string between $p$ and one of the other two ($p' \in \{p_2, p_3\}$). We pick
three such positions to ensure that the condition on the first and last zone of $v$ holds.

Let $q$ be some accepting state that occurs infinitely often in the zone sub-run. Such a
state exists by our assumption that the automaton assumes accepting states at the end
of a zone if it assumed one inside the zone. Let $p_1$ be the minimal position in which the run
assumes $q$ at the end of a zone and such that all white zones are before $p_1$. We next choose,
for each frequent pair $(q, q')$, one class $c_{q,q'}$ of $w_2$ that is neither black nor white, contains
a zone with adornment $(q, q')$ and is located after $p_1$. Let $p_2$ be the minimal position in
which the zone sub-run assumes $q$ and such that all zones of classes $c_{q,q'}$ are before $p_2$.

Now, we choose, for each frequent pair $(q, q')$, one class $c'_{q,q'}$ of $w_2$ that is neither black
nor white, contains a zone with adornment $(q, q')$ and is located after $p_2$. Finally, we let
$p_3$ be the minimal position in which the zone sub-run assumes $q$ and such that all zones of
classes $c'_{q,q'}$ are before $p_3$.

If the first zone after $p_1$ has a different data value than the last zone before $p_2$ or at
least one of them is not black we set $p = p_1$ and $p' = p_2$. Otherwise, if the first zone after
$p_2$ has a different data value than the last zone before $p_3$ or at least one of them is not
black we set $p = p_2$ and $p' = p_3$. Otherwise, we set $p = p_1$ and $p' = p_3$. In either case, the
first zone after $p$ has a different data value than the last zone before $p'$ or at least one of
them is not black.

* We remind the reader that the term redundant is always relative to a class. We note that the
zones z are redundant in their original class and also in their black target class.
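The case distinction for choosing $p$ and $p'$ can be sketched in Python as follows. The encoding of a zone by a pair (data value, is black) and the dictionaries `after` and `before` are assumptions of this sketch. The fallback choice is safe because the last zone before $p_2$ and the first zone after $p_2$ are adjacent maximal zones and hence carry different data values: if both earlier tests fail, the first zone after $p_1$ shares its value with the former and the last zone before $p_3$ shares its value with the latter.

```python
from typing import Dict, Tuple

ZoneInfo = Tuple[int, bool]  # hypothetical encoding: (data value, is the zone black?)


def conflict(first_after: ZoneInfo, last_before: ZoneInfo) -> bool:
    """True iff both zones are black and carry the same data value."""
    (value1, black1), (value2, black2) = first_after, last_before
    return black1 and black2 and value1 == value2


def choose_cut(after: Dict[int, ZoneInfo], before: Dict[int, ZoneInfo]) -> Tuple[int, int]:
    """Return indices (i, j) meaning p = p_i and p' = p_j.

    after[i] describes the first zone after p_i (needed for i = 1, 2),
    before[j] describes the last zone before p_j (needed for j = 2, 3).
    """
    if not conflict(after[1], before[2]):
        return 1, 2
    if not conflict(after[2], before[3]):
        return 2, 3
    return 1, 3  # safe by the adjacency argument in the lead-in
```

The returned pair tells which of the classes $c_{q,q'}$ or $c'_{q,q'}$ end up inside $v$ and are therefore colored yellow in the next step.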

If $p = p_1$ we color the classes $c_{q,q'}$ yellow, otherwise we color the classes $c'_{q,q'}$ yellow.

In the last transformation step, we color all not yet colored zones blue and furthermore
modify blue zones as follows. Let $z$ be a blue zone with adornment $(q, q')$. As $z$ is neither
white nor black, $(q, q')$ is frequent. Let $z'$ be the $(q, q')$-adorned zone in $c_{q,q'}$ (or in $c'_{q,q'}$ if
$p = p_2$). We keep the data value of $z$ but replace its string projection by Str($z'$). This does
not affect the zone sub-run, but it might cause a constraint conflict for the blue class (but,
as already mentioned, we need not care about this).

Let $w_3$ be the resulting data string.

Now we define $u$ to be the prefix of $w_3$ until position $p$ and $v$ to be the data string from
(excluding) position $p$ to position* $p'$.

This construction guarantees that $uv$ is a non-singular witness for (A, C). This completes
the proof of Claim 2.

Finally, we prove Claim 3. As we assume that A copies its input string to the output (and
thus basically is an automaton) the reduction also constructs WDA with this property. We
first show how to compute a WDA (A', C') for singular witnesses for (A, C). The algorithm
first guesses a symbol $a_0$ that occurs in the infinite class.

A' has input alphabet $\Gamma \times \{0, 1, 2\}$. The symbols of the form $(a, 1)$ and $(a, 2)$ are used
for the class of $v$ and the others for the remaining classes. A' simulates A when it reads
only the $\Gamma$-part. Furthermore, it guesses a position $p$ (intuitively: the border between $u$ and
$v$) that has a state $q$ from $F$ and verifies that the final state is $q$ as well. For the simulation
of the last step A' behaves as if the final symbol carried a $\top$-symbol in its profile part.
A' further checks that all symbols from $\Gamma \times \{0\}$ occur before position $p - 1$ and that no
symbols $(a, 1)$ or $(a, 2)$, where $a$ is a key symbol of C, occur after $p$. It furthermore checks
that there is only one occurrence of $(a_0, 2)$ and no other $(a, 2)$. All constraints from C are
reproduced in C' separately, for symbols of the form $(a, 0)$ and $(a, 1)$. Furthermore, the constraints
ensure that in each class either only $(a, 0)$-symbols occur or only $(a, 1)$-symbols and $(a_0, 2)$
(by denial constraints). Finally, there is an inclusion constraint $V((a, 1)) \subseteq V((a_0, 2))$, for
every $a \in \Gamma$, and a key constraint for $(a_0, 2)$, making sure that there is only one class with
symbols from $\Gamma \times \{1\}$.
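To illustrate how the three kinds of class constraints act on a data string, here is a minimal Python checker. It assumes the usual reading of these constraints, which is not restated in this section: a key constraint for $a$ allows at most one $a$-position per class, an inclusion constraint $V(a) \subseteq V(b)$ requires every class containing $a$ to contain $b$, and a denial constraint for $a$ and $b$ forbids classes containing both. The encoding and all names are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

DataWord = List[Tuple[str, int]]  # hypothetical encoding: (symbol, data value)


def classes(word: DataWord) -> Dict[int, List[str]]:
    """Group the symbols of a data string by data value."""
    by_value: Dict[int, List[str]] = defaultdict(list)
    for symbol, value in word:
        by_value[value].append(symbol)
    return by_value


def satisfies(
    word: DataWord,
    keys: Iterable[str] = (),
    inclusions: Iterable[Tuple[str, str]] = (),
    denials: Iterable[Tuple[str, str]] = (),
) -> bool:
    """Check key, inclusion and denial constraints class by class (assumed semantics)."""
    keys, inclusions, denials = list(keys), list(inclusions), list(denials)
    for symbols in classes(word).values():
        if any(symbols.count(a) > 1 for a in keys):
            return False  # key: at most one a-position per class
        if any(a in symbols and b not in symbols for a, b in inclusions):
            return False  # inclusion: a in the class implies b in the class
        if any(a in symbols and b in symbols for a, b in denials):
            return False  # denial: a and b never share a class
    return True
```

Under this reading, repeating a zone without key symbols inside its own class cannot break any constraint, which is the property the witness definitions rely on.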

It remains to show how to (nondeterministically) compute a WDA (A', C') for non-singular
witnesses for (A, C). The basic idea is that the algorithm first guesses the adorned
zones that are used in yellow classes in the order in which they appear in the witness.
Furthermore, for each black class, it guesses some symbol $a$ occurring in that class and
it guesses the order in which these symbols occur. These symbols are colored with black'
instead of black. We use these nondeterministic guesses in the reduction as otherwise A'
would need to handle, e.g., all possible orders in which the yellow zones appear. This would
result in an exponential blow-up.



* Formally, the position p' might have been modified during the last transformation step. However,
we refer by p' to the last position of the zone that corresponds to the zone that ended in p' in
$w_2$.



A' uses the alphabet $\Gamma \times \{\text{black}, \text{black}', \text{yellow}, \text{white}, \text{blue}\} \times \{0, \ldots, 3|Q|^2\}$. It reads
colored symbols and always simulates A on the uncolored projection. It guesses a position
$p$ and a state $q$ and checks that

— after position $p$ the run has state $q$ and likewise at the end;

— between position $p$ and the end, the run assumes some accepting state;

— white symbols only occur before $p$;

— the yellow zones all appear after $p$ and they exactly correspond to the adorned zones
that were guessed before;

— each blue zone has the same adorned projection as some yellow zone;

— white, yellow and blue symbols carry a 0 in their last component;

— black symbols carry a non-zero number in their last component;

— there are no black key symbols after $p$;

— it is not the case that the first zone after $p$ is in the same class as the last zone and
that they are both black;

— each expected black' symbol occurs exactly once and they occur in the expected order.

In C' the constraints of C are reproduced for the black, yellow and white classes. Some
constraints are added that ensure that in each black class a black'-symbol occurs, similarly
as for the singular case. Furthermore, in black classes, all symbols have the same number
in their last component. In this way, it is ensured that for each $i \in \{1, \ldots, 3|Q|^2\}$, there is
at most one black class.

Clearly, A' and C' can be computed in polynomial time. The computation is determin- 
istic, once the above mentioned values are guessed. 

This completes the proof of Claim 3 and thus the proof of the theorem. □ 

6 Conclusion 

We conclude this paper with two open problems for future directions. An obvious open 
problem is the exact complexity of the nonemptiness problem for weak data automata. 
The current 2-NEXPTIME bound yields a 3-NEXPTIME upper bound for the satisfiability problem
for $\mathrm{EMSO}^2(+1, \sim)$. However, as it is known that this problem can be solved in
2-NEXPTIME [T^, some room for improvement is left.

Another interesting question is how our results can be applied to temporal logics. In [10],
a restriction of LTL with one register, simple LTL, was considered with the same expressive 
power as some two variable logic. We conjecture that there is a correspondence between 
our logics and the restriction of simple LTL to the operators X, and an operator that 
allows navigation to some other position. 